
    Hand gesture recognition with jointly calibrated Leap Motion and depth sensor

    Novel 3D acquisition devices like depth cameras and the Leap Motion have recently reached the market. Depth cameras provide a complete 3D description of the framed scene, while the Leap Motion is a device explicitly targeted at hand gesture recognition and provides only a limited set of relevant points. This paper shows how to jointly exploit the two types of sensors for accurate gesture recognition. An ad-hoc solution for the joint calibration of the two devices is first presented. Then a set of novel feature descriptors is introduced for both the Leap Motion and the depth data. Various schemes based on the distances of the hand samples from the centroid, on the curvature of the hand contour, and on the convex hull of the hand shape are employed, and the use of Leap Motion data to aid feature extraction is also considered. The proposed feature sets are fed to two different classifiers, one based on multi-class SVMs and one exploiting Random Forests. Different feature selection algorithms have also been tested in order to reduce the complexity of the approach. Experimental results show that the proposed method achieves very high accuracy, and the current implementation is able to run in real time.
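
    As a rough illustration of the distance-based descriptors and multi-class SVM classification mentioned in the abstract (a sketch, not the authors' code), the following Python snippet resamples the distances of hand-contour points from the centroid into a fixed-length descriptor and feeds it to an SVM; train_contours, train_labels, and test_contour are hypothetical placeholders for already-segmented data.

    import numpy as np
    from sklearn.svm import SVC

    def centroid_distance_features(contour, n_bins=64):
        """Resample the distances of contour points from the hand centroid
        into a fixed-length, scale-normalized descriptor."""
        centroid = contour.mean(axis=0)
        d = np.linalg.norm(contour - centroid, axis=1)
        d = d / d.max()                                   # scale invariance
        idx = np.linspace(0, len(d) - 1, n_bins).astype(int)
        return d[idx]

    # Hypothetical training data: (N, 2) contour arrays and integer gesture labels.
    X = np.stack([centroid_distance_features(c) for c in train_contours])
    clf = SVC(kernel="rbf", C=10.0)                       # one-vs-one multi-class SVM
    clf.fit(X, train_labels)
    pred = clf.predict(centroid_distance_features(test_contour)[None, :])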

    Real-time hand gesture recognition exploiting multiple 2D and 3D cues

    The recent introduction of several 3D applications and stereoscopic display technologies has created the need for novel human-machine interfaces. Traditional input devices, such as keyboard and mouse, cannot fully exploit the potential of these interfaces and do not offer a natural interaction. Hand gestures, instead, provide a more natural and sometimes safer way of interacting with computers and other machines without touching them. The use cases for gesture-based interfaces range from gaming to automatic sign language interpretation, health care, robotics, and vehicle automation. Automatic gesture recognition is a challenging problem that has been attracting growing research interest for several years due to its applications in natural interfaces. The first approaches, based on recognition from 2D color pictures or video only, suffered from the typical problems characterizing such data: inter-occlusions, different skin colors among users even of the same ethnic group, and unstable illumination conditions often made the problem intractable. Other approaches sidestepped these problems by making the user wear sensorized gloves or hold tools designed to ease hand localization in the scene. The recent introduction in the mass market of novel low-cost range cameras, like the Microsoft Kinect, Asus XTION, Creative Senz3D, and the Leap Motion, has opened the way to innovative gesture recognition approaches exploiting the geometry of the framed scene. Most methods share a common recognition pipeline: first identify the hand in the framed scene, then extract relevant features from the hand samples, and finally apply suitable machine learning techniques to recognize the performed gesture from a predefined ``gesture dictionary''. Based on this rationale, this thesis proposes a novel gesture recognition framework exploiting both color and geometric cues from low-cost color and range cameras. The dissertation starts by introducing the automatic hand gesture recognition problem, giving an overview of the state-of-the-art algorithms and the recognition pipeline employed in this work. It then briefly describes the major low-cost range cameras and the setups used in the literature for acquiring color and depth data for hand gesture recognition, highlighting their capabilities and limitations. The methods employed for detecting the hand in the framed scene and segmenting it into its relevant parts are then analyzed in greater detail. The algorithm first exploits skin color information and geometric considerations to discard the background samples, then reliably detects the palm and finger regions and removes the forearm. For palm detection, the method fits the largest circle inscribed in the palm region or, in a more advanced version, an ellipse. A set of robust color and geometric features that can be extracted from the previously segmented finger and palm regions is then described in detail. Geometric features describe properties of the hand contour through its curvature variations and the distances, in 3D space or in the image plane, of its points from the hand center or from the palm, or extract relevant information from the palm morphology and from the empty space in the hand convex hull.
    Color features exploit, instead, the histogram of oriented gradients (HOG), local phase quantization (LPQ), and local ternary patterns (LTP) algorithms to provide further helpful cues from the hand texture and from the depth map treated as a grayscale image. Additional features extracted from the Leap Motion data complete the gesture characterization for a more reliable recognition. Moreover, the thesis reports a novel approach that jointly exploits the geometric data provided by the Leap Motion and the depth data from a range camera to extract the same depth features with a significantly lower computational effort. The work then addresses the delicate problem of building a robust gesture recognition model from the previously described features, using multi-class Support Vector Machines, Random Forests, or more powerful ensembles of classifiers. Feature selection techniques, designed to find the smallest subset of features that allows training a leaner classification model without a significant accuracy loss, are also considered. The proposed recognition method, tested on subsets of the American Sign Language and experimentally validated, reported very high accuracies. The results also showed that higher accuracies are obtainable by combining proper sets of complementary features and by using ensembles of classifiers. Moreover, it is worth noting that the proposed approach is not sensor dependent, that is, the recognition algorithm is not bound to a specific sensor or technology for the depth data acquisition. Finally, the gesture recognition algorithm runs in real time even without a thorough optimization, and may easily be extended in the near future with novel descriptors and support for dynamic gestures.
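
    An illustrative sketch (assumptions, not the thesis code) of two ingredients named above: HOG descriptors computed on the depth map treated as a grayscale image, and an ensemble of an SVM and a Random Forest combined by soft voting. The training arrays train_depth_crops and train_labels are placeholders for segmented hand crops and gesture labels.

    import numpy as np
    from skimage.feature import hog
    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.svm import SVC

    def hog_descriptor(depth_image):
        # Rescale the depth map to [0, 1] and describe it with HOG, as grayscale.
        img = (depth_image - depth_image.min()) / np.ptp(depth_image)
        return hog(img, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

    X = np.stack([hog_descriptor(d) for d in train_depth_crops])  # placeholder data
    ensemble = VotingClassifier(
        estimators=[("svm", SVC(kernel="rbf", probability=True)),
                    ("rf", RandomForestClassifier(n_estimators=200))],
        voting="soft")                                    # average class probabilities
    ensemble.fit(X, train_labels)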

    Stereo Vision and Scene Segmentation

    This chapter focuses on how segmentation robustness can be improved by the 3D scene geometry provided by stereo vision systems, as they are simpler and relatively cheaper than most current range cameras. In fact, two inexpensive cameras arranged in a rig are often enough to obtain good results. Another noteworthy characteristic motivating the choice of stereo systems is that they provide both the 3D geometry and the color information of the framed scene without requiring further hardware. Indeed, as will be seen in the following sections, 3D geometry extraction from a framed scene by a stereo system, also known as stereo reconstruction, may be eased and improved by scene segmentation, since the correspondence search can be restricted to points lying within the same segment in the left and right images.
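
    A minimal stereo-reconstruction sketch with OpenCV semi-global block matching, assuming a rectified image pair; the segment-restricted correspondence search discussed in the chapter would further constrain the matching and is not shown here. File names are hypothetical.

    import cv2

    left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # hypothetical file names
    right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

    stereo = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=9)
    disparity = stereo.compute(left, right).astype("float32") / 16.0  # fixed-point to pixels

    # Given the calibration parameters, depth follows from Z = f * B / disparity,
    # where f is the focal length (in pixels) and B the baseline between the cameras.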

    Stima della traiettoria della mano a partire da dati 3D (Hand trajectory estimation from 3D data)

    The work described in this thesis was carried out within a larger project whose goal is the development of a human-machine interface capable of accurately recognizing the hand gestures performed by a generic user. Unlike the systems usually adopted for human body motion capture, which are generally expensive and require complex algorithms for the analysis of 2D or pseudo-3D data, the system described here is designed to process 3D data only. The thesis describes the development of this system, reporting the results as well as the strengths and weaknesses of the design choices made. Chapter 1 illustrates the experimental acquisition setup; Chapter 2 discusses the segmentation problem and a possible solution; Chapter 3 deals with the construction of a hand model, needed for the hand pose recognition stage described in Chapter 4. Finally, the results of some experimental tests of the pose estimation algorithm are reported, with the related plots collected in the Appendix, and conclusions are drawn.

    Hand gesture recognition with leap motion and kinect devices

    The recent introduction of novel acquisition devices like the Leap Motion and the Kinect makes it possible to obtain a very informative description of the hand pose that can be exploited for accurate gesture recognition. This paper proposes a novel hand gesture recognition scheme explicitly targeted at Leap Motion data. An ad-hoc feature set based on the positions and orientations of the fingertips is computed and fed into a multi-class SVM classifier in order to recognize the performed gestures. A set of features is also extracted from the depth map computed from the Kinect and combined with the Leap Motion ones in order to improve the recognition performance. Experimental results compare the accuracy obtainable from the two devices on a subset of the American Manual Alphabet and show how, by combining the two feature sets, a very high accuracy can be achieved in real time.
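
    A hedged sketch of the feature-fusion idea: fingertip-based features from Leap Motion data concatenated with depth-based features before a multi-class SVM. The helper and the arrays leap_feats, depth_feats, and labels are placeholders standing in for the paper's descriptors, not its actual implementation.

    import numpy as np
    from sklearn.svm import SVC

    def fingertip_features(tips, palm_center, palm_normal):
        """Distances and elevations of the fingertips (up to 5 x 3D points)
        with respect to the palm center and palm plane."""
        rel = tips - palm_center
        distances = np.linalg.norm(rel, axis=1)
        elevations = rel @ palm_normal                    # signed distance from palm plane
        return np.concatenate([distances, elevations])

    # leap_feats / depth_feats: per-sample feature arrays from each device (placeholders).
    X = np.hstack([leap_feats, depth_feats])              # simple concatenation fusion
    clf = SVC(kernel="rbf").fit(X, labels)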

    Combining multiple depth-based descriptors for hand gesture recognition

    Depth data acquired by current low-cost real-time depth cameras provide a more informative description of the hand pose that can be exploited for gesture recognition purposes. Following this rationale, this paper introduces a novel hand gesture recognition scheme based on depth information. The hand is first extracted from the acquired data and divided into palm and finger regions. Then four different sets of feature descriptors are extracted, accounting for different clues such as the distances of the fingertips from the hand center and from the palm plane, the curvature of the hand contour, and the geometry of the palm region. Finally, a multi-class SVM classifier is employed to recognize the performed gestures. Experimental results demonstrate the ability of the proposed scheme to achieve very high accuracy both on standard datasets and on more complex ones acquired for the experimental evaluation. The current implementation is also able to run in real time.
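
    A sketch under stated assumptions of one descriptor family mentioned above: fitting the palm plane to the segmented palm points by least squares (via SVD) and measuring the signed distances of the fingertips from it. Inputs are placeholder NumPy arrays of 3D points.

    import numpy as np

    def fit_palm_plane(palm_points):
        """Return (centroid, unit normal) of the best-fit plane through palm_points."""
        centroid = palm_points.mean(axis=0)
        _, _, vt = np.linalg.svd(palm_points - centroid)
        return centroid, vt[-1]                           # normal = direction of least variance

    def fingertip_plane_distances(fingertips, centroid, normal):
        # Signed distance of each fingertip from the palm plane.
        return (fingertips - centroid) @ normal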

    Real-time gaze estimation via pupil center tracking

    Automatic gaze estimation that does not rely on commercial and expensive eye tracking hardware can enable several applications in the fields of human-computer interaction (HCI) and human behavior analysis. It is therefore not surprising that several related techniques and methods have been investigated in recent years. However, very few camera-based systems proposed in the literature are both real-time and robust. In this work, we propose a real-time gaze estimation system that does not need person-dependent calibration, can deal with illumination changes and head pose variations, and can work at a wide range of distances from the camera. Our solution is based on a 3-D appearance-based method that processes the images from a built-in laptop camera. Real-time performance is obtained by combining head pose information with geometrical eye features to train a machine learning algorithm. Our method has been validated on a dataset of images of users in natural environments and shows promising results. The possibility of a real-time implementation, combined with the good quality of the gaze tracking, makes this system suitable for various HCI applications.
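
    An illustrative sketch, not the paper's implementation: geometrical eye features combined with head pose angles to train a regressor mapping to on-screen gaze coordinates. Feature extraction is abstracted into placeholder arrays (eye_feats, head_pose, gaze_xy), and the Random Forest regressor stands in for whichever learner the authors used.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    # eye_feats: per-frame geometrical eye features (e.g., pupil center offsets);
    # head_pose: per-frame (yaw, pitch, roll); gaze_xy: ground-truth screen points.
    X = np.hstack([eye_feats, head_pose])                 # placeholder training arrays
    model = RandomForestRegressor(n_estimators=300)
    model.fit(X, gaze_xy)                                 # multi-output regression (x, y)
    gaze = model.predict(X[:1])                           # predicted gaze point for one frame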